AITopics | discrete cosine transform

discrete cosine transform

Accurate Thyroid Cancer Classification using a Novel Binary Pattern Driven Local Discrete Cosine Transform Descriptor

Saini, Saurabh, Ahuja, Kapil, Steinbach, Marc C., Wick, Thomas

arXiv.org Artificial Intelligence, Sep-23-2025

In this study, we develop a new CAD system for accurate thyroid cancer classification with emphasis on feature extraction. Prior studies have shown that thyroid texture is important for segregating the thyroid ultrasound images into different classes. Based upon our experience with breast cancer classification, we first conjecture that the Discrete Cosine Transform (DCT) is the best descriptor for capturing textural features. Thyroid ultrasound images are particularly challenging as the gland is surrounded by multiple complex anatomical structures leading to variations in tissue density. Hence, we second conjecture the importance of localization and propose that the Local DCT (LDCT) descriptor captures the textural features best in this context. Another disadvantage of the complex anatomy around the thyroid gland is scattering of ultrasound waves, resulting in noisy and unclear textures. Hence, we third conjecture that one image descriptor is not enough to fully capture the textural features and propose the integration of another popular texture-capturing descriptor (Improved Local Binary Pattern, ILBP) with LDCT. ILBP is known to be noise resilient as well. We term our novel descriptor Binary Pattern Driven Local Discrete Cosine Transform (BPD-LDCT). Final classification is carried out using a non-linear SVM. The proposed CAD system is evaluated on the only two publicly available thyroid cancer datasets, namely TDID and AUITD. The evaluation is conducted in two stages. In Stage I, thyroid nodules are categorized as benign or malignant. In Stage II, the malignant cases are further sub-classified into TI-RADS (4) and TI-RADS (5). For Stage I classification, our proposed model demonstrates exceptional performance of nearly 100% on TDID and 97% on AUITD. In Stage II classification, the proposed model again attains excellent classification of close to 100% on TDID and 99% on AUITD.

data quality, descriptor, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.16382

Country: Europe (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Therapeutic Area > Oncology > Thyroid Cancer (0.81)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Quality > Data Transformation (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(4 more...)

Compressed Sensing: Mathematical Foundations, Implementation, and Advanced Optimization Techniques

Stevenson, Shane, Sabagh, Maryam

arXiv.org Artificial Intelligence, Sep-16-2025

Compressed sensing is a signal processing technique that allows for the reconstruction of a signal from a small set of measurements. The key idea behind compressed sensing is that many real-world signals are inherently sparse, meaning that they can be efficiently represented in a different space with only a few components compared to their original space representation. In this paper we will explore the mathematical formulation behind compressed sensing, its logic and pathologies, and apply compressed sensing to real world signals.
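The sparsity premise in the abstract above can be illustrated with a small sketch. It is not taken from the paper: the test signal, its length, and the threshold are assumptions chosen for illustration, and the sketch shows only that a signal which is dense in the time domain can have very few nonzero DCT coefficients. It does not perform compressed-sensing recovery.

```python
import math

def dct2_ortho(x):
    """Orthonormal DCT-II of a real sequence (direct O(n^2) evaluation)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

n = 64
# Hypothetical test signal: a mix of two DCT basis cosines (frequencies 3 and 10).
# Almost every time-domain sample is nonzero.
signal = [
    1.0 * math.cos(math.pi * (i + 0.5) * 3 / n)
    + 0.5 * math.cos(math.pi * (i + 0.5) * 10 / n)
    for i in range(n)
]

coeffs = dct2_ortho(signal)
# Indices of coefficients that are not numerically zero (threshold is an assumption).
significant = [k for k, c in enumerate(coeffs) if abs(c) > 1e-9]
print(significant)
```

In the DCT domain only two of the 64 coefficients carry the signal, which is the kind of structure that sparse recovery methods rely on.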

artificial intelligence, optimization problem, vector, (16 more...)

arXiv.org Artificial Intelligence

2509.1155

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.51)

Playing Atari Space Invaders with Sparse Cosine Optimized Policy Evolution

O'Connor, Jim, Nash, Jay B., Gezgin, Derin, Parker, Gary B.

arXiv.org Artificial Intelligence, Aug-13-2025

Evolutionary approaches have previously been shown to be effective learning methods for a diverse set of domains. However, the domain of game-playing poses a particular challenge for evolutionary methods due to the inherently large state space of video games. As the size of the input state expands, the size of the policy must also increase in order to effectively learn the temporal patterns in the game space. Consequently, a larger policy must contain more trainable parameters, exponentially increasing the size of the search space. Any increase in search space is highly problematic for evolutionary methods, as increasing the number of trainable parameters is inversely correlated with convergence speed. To reduce the size of the input space while maintaining a meaningful representation of the original space, we introduce Sparse Cosine Optimized Policy Evolution (SCOPE). SCOPE utilizes the Discrete Cosine Transform (DCT) as a pseudo attention mechanism, transforming an input state into a coefficient matrix. By truncating and applying sparsification to this matrix, we reduce the dimensionality of the input space while retaining the highest energy features of the original input. We demonstrate the effectiveness of SCOPE as the policy for the Atari game Space Invaders. In this task, SCOPE with CMA-ES outperforms evolutionary methods that consider an unmodified input state, such as OpenAI-ES and HyperNEAT. SCOPE also outperforms simple reinforcement learning methods, such as DQN and A3C. SCOPE achieves this result through reducing the input size by 53% from 33,600 to 15,625 then using a bilinear affine mapping of sparse DCT coefficients to policy actions learned by the CMA-ES algorithm.
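The truncate-then-sparsify idea described in this abstract might look roughly like the sketch below. This is an illustration under assumptions, not the authors' implementation: the function names, the 8x8 input, the 4x4 retained block, and the magnitude-based `keep_fraction` rule are all hypothetical, and the learned bilinear mapping and CMA-ES training are omitted.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a real sequence."""
    n = len(x)
    return [
        (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
        * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]

def dct_2d(m):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in m]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major

def truncate_and_sparsify(coeffs, keep_rows, keep_cols, keep_fraction):
    """Keep the low-frequency top-left block, then zero all but the
    largest-magnitude entries (the selection rule is an assumption)."""
    block = [row[:keep_cols] for row in coeffs[:keep_rows]]
    mags = sorted((abs(v) for row in block for v in row), reverse=True)
    n_keep = max(1, int(len(mags) * keep_fraction))
    threshold = mags[n_keep - 1]
    return [[v if abs(v) >= threshold else 0.0 for v in row] for row in block]

# Hypothetical 8x8 "frame" with a smooth gradient.
frame = [[float(r + c) for c in range(8)] for r in range(8)]
state = truncate_and_sparsify(dct_2d(frame), keep_rows=4, keep_cols=4, keep_fraction=0.25)
print(len(state), len(state[0]))
```

The reduced matrix `state` would then be the input to a policy, so the number of policy parameters scales with the truncated size instead of the full frame size.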

evolutionary algorithm, machine learning, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

2508.08526

Country:

North America > United States (0.28)
North America > Canada > Alberta (0.28)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Fourier-VLM: Compressing Vision Tokens in the Frequency Domain for Large Vision-Language Models

Wang, Huanyu, Kai, Jushi, Bai, Haoli, Hou, Lu, Jiang, Bo, He, Ziwei, Lin, Zhouhan

arXiv.org Artificial Intelligence, Aug-12-2025

Vision-Language Models (VLMs) typically replace the predefined image placeholder token in textual instructions with visual features from an image encoder, forming the input to a backbone Large Language Model (LLM). However, the large number of vision tokens significantly increases the context length, leading to high computational overhead and inference latency. While previous efforts mitigate this by selecting only important visual features or leveraging learnable queries to reduce token count, they often compromise performance or introduce substantial extra costs. In response, we propose Fourier-VLM, a simple yet efficient method that compresses visual representations in the frequency domain. Our approach is motivated by the observation that vision features output from the vision encoder exhibit concentrated energy in low-frequency components. Leveraging this, we apply a low-pass filter to the vision features using a two-dimensional Discrete Cosine Transform (DCT). Notably, the DCT is efficiently computed via the Fast Fourier Transform (FFT) operator with a time complexity of $\mathcal{O}(n\log n)$, minimizing the extra computational cost while introducing no additional parameters. Extensive experiments across various image-based benchmarks demonstrate that Fourier-VLM achieves competitive performance with strong generalizability across both LLaVA and Qwen-VL architectures. Crucially, it reduces inference FLOPs by up to 83.8% and boosts generation speed by 31.2% compared to LLaVA-v1.5, highlighting its superior efficiency and practicality.
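A one-dimensional analogue of this kind of frequency-domain low-pass compression is sketched below. It is a simplification under assumptions, not the Fourier-VLM implementation: it handles a single feature channel along the token axis, evaluates the DCT directly in O(n^2) instead of via the FFT, and the function name `compress_tokens` and the amplitude rescaling are hypothetical choices.

```python
import math

def _scale(k, n):
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct(x):
    """Orthonormal DCT-II."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]

def idct(c):
    """Orthonormal DCT-III, the inverse of dct() above."""
    n = len(c)
    return [
        sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
        for i in range(n)
    ]

def compress_tokens(features, k):
    """Reduce n token values (one channel) to k values by keeping only
    the k lowest-frequency DCT coefficients and inverting at length k."""
    n = len(features)
    low = dct(features)[:k]
    rescale = math.sqrt(k / n)  # keeps a constant signal at the same amplitude
    return [rescale * v for v in idct(low)]

# Hypothetical channel: 16 token values compressed to 4.
compressed = compress_tokens([2.0] * 16, 4)
print(compressed)
```

Because only the low-frequency coefficients are kept, the shortened sequence preserves slowly varying structure and discards fine detail, which matches the low-pass behavior the abstract describes.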

data quality, large language model, natural language, (20 more...)

arXiv.org Artificial Intelligence

2508.06038

Country: Asia > China (0.15)

Genre: Research Report (0.44)

Technology:

Information Technology > Data Science > Data Quality > Data Transformation (0.90)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.56)

SCOPE for Hexapod Gait Generation

O'Connor, Jim, Nash, Jay B., Gezgin, Derin, Parker, Gary B.

arXiv.org Artificial Intelligence, Jul-21-2025

Evolutionary methods have previously been shown to be effective for learning walking gaits on hexapod robots. However, the ability of these algorithms to evolve an effective policy rapidly degrades as the input space becomes more complex. This degradation is due to the exponential growth of the solution space, resulting from an increasing parameter count to handle a more complex input. In order to address this challenge, we introduce Sparse Cosine Optimized Policy Evolution (SCOPE). SCOPE utilizes the Discrete Cosine Transform (DCT) to learn directly from the feature coefficients of an input matrix. By truncating the coefficient matrix returned by the DCT, we can reduce the dimensionality of an input while retaining the highest energy features of the original input. We demonstrate the effectiveness of this method by using SCOPE to learn the gait of a hexapod robot. The hexapod controller is given a matrix input containing time-series information of previous poses, which is then transformed to gait parameters by an evolved policy. In this task, the addition of SCOPE to a reference algorithm achieves a 20% increase in efficacy. SCOPE achieves this result by reducing the total input size of the time-series pose data from 2700 to 54, a 98% decrease. Additionally, SCOPE is capable of compressing an input to any output shape, provided that each output dimension is no greater than the corresponding input dimension. This paper demonstrates that SCOPE is capable of significantly compressing the size of an input to an evolved controller, resulting in a statistically significant gain in efficacy.

artificial intelligence, evolutionary algorithm, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2507.13539

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Efficient Transformations in Deep Learning Convolutional Neural Networks

Yilmaz, Berk, Harvey, Daniel Fidel, Dhuri, Prajit

arXiv.org Artificial Intelligence, Jun-23-2025

This study investigates the integration of signal processing transformations -- Fast Fourier Transform (FFT), Walsh-Hadamard Transform (WHT), and Discrete Cosine Transform (DCT) -- within the ResNet50 convolutional neural network (CNN) model for image classification. The primary objective is to assess the trade-offs between computational efficiency, energy consumption, and classification accuracy during training and inference. Using the CIFAR-100 dataset (100 classes, 60,000 images), experiments demonstrated that incorporating WHT significantly reduced energy consumption while improving accuracy. Specifically, a baseline ResNet50 model achieved a testing accuracy of 66%, consuming an average of 25,606 kJ per model. In contrast, a modified ResNet50 incorporating WHT in the early convolutional layers achieved 74% accuracy, and an enhanced version with WHT applied to both early and late layers achieved 79% accuracy, with an average energy consumption of only 39 kJ per model. These results demonstrate the potential of WHT as a highly efficient and effective approach for energy-constrained CNN applications.

accuracy, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2506.16418

Country:

North America > United States (1.00)
Europe (0.67)

Genre: Research Report > New Finding (1.00)

Industry:

Energy (1.00)
Government > Regional Government > North America Government > United States Government (0.68)
Government > Space Agency (0.46)
Media > Photography (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

FAST: Efficient Action Tokenization for Vision-Language-Action Models

Pertsch, Karl, Stachowicz, Kyle, Ichter, Brian, Driess, Danny, Nair, Suraj, Vuong, Quan, Mees, Oier, Finn, Chelsea, Levine, Sergey

arXiv.org Artificial Intelligence, Jan-16-2025

Autoregressive sequence models, such as Transformer-based vision-language action (VLA) policies, can be tremendously effective for capturing complex and generalizable robotic behaviors. However, such models require us to choose a tokenization of our continuous action signals, which determines how the discrete symbols predicted by the model map to continuous robot actions. We find that current approaches for robot action tokenization, based on simple per-dimension, per-timestep binning schemes, typically perform poorly when learning dexterous skills from high-frequency robot data. To address this challenge, we propose a new compression-based tokenization scheme for robot actions, based on the discrete cosine transform. Our tokenization approach, Frequency-space Action Sequence Tokenization (FAST), enables us to train autoregressive VLAs for highly dexterous and high-frequency tasks where standard discretization methods fail completely. Based on FAST, we release FAST+, a universal robot action tokenizer, trained on 1M real robot action trajectories. It can be used as a black-box tokenizer for a wide range of robot action sequences, with diverse action spaces and control frequencies. Finally, we show that, when combined with the pi0 VLA, our method can scale to training on 10k hours of robot data and match the performance of diffusion VLAs, while reducing training time by up to 5x.
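The general idea of a DCT-based, compression-style tokenization can be sketched as follows. This is a loose illustration under assumptions and not the FAST tokenizer: it covers a single action dimension, the `scale` quantization parameter and the function names are hypothetical, and any further compression of the integer sequence is left out.

```python
import math

def _scale(k, n):
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct(x):
    """Orthonormal DCT-II."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]

def idct(c):
    """Orthonormal DCT-III, the inverse of dct() above."""
    n = len(c)
    return [
        sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
        for i in range(n)
    ]

def encode(actions, scale=10.0):
    """Map a continuous action chunk to integers: DCT, then scale and round."""
    return [round(scale * c) for c in dct(actions)]

def decode(tokens, scale=10.0):
    """Approximate inverse of encode(): undo the scaling, then inverse DCT."""
    return idct([t / scale for t in tokens])

# Hypothetical smooth 1-D action trajectory sampled at 50 timesteps.
chunk = [math.sin(2 * math.pi * t / 50) for t in range(50)]
tokens = encode(chunk)
recon = decode(tokens)
max_err = max(abs(a - b) for a, b in zip(chunk, recon))
print(max_err)
```

A larger `scale` gives finer quantization and lower reconstruction error at the cost of a wider range of integer symbols, which is the basic trade-off in this style of tokenization.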

dataset, tokenization, tokenizer, (13 more...)

arXiv.org Artificial Intelligence

2501.09747

Country:

Europe > Netherlands > South Holland > Delft (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Transportation > Air (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.46)

Audios Don't Lie: Multi-Frequency Channel Attention Mechanism for Audio Deepfake Detection

Feng, Yangguang

arXiv.org Artificial Intelligence, Dec-12-2024

With the rapid development of artificial intelligence technology, the application of deepfake technology in the audio field has gradually increased, resulting in a wide range of security risks. Especially in the financial and social security fields, the misuse of deepfake audio has raised serious concerns. To address this challenge, this study proposes an audio deepfake detection method based on a multi-frequency channel attention mechanism (MFCA) and the 2D discrete cosine transform (DCT). By processing the audio signal into a mel spectrogram, using MobileNet V2 to extract deep features, and combining it with the MFCA module to weight different frequency channels in the audio signal, this method can effectively capture the fine-grained frequency domain features in the audio signal and enhance the classification capability for fake audio. Experimental results show that, compared with traditional methods, the model proposed in this study shows significant advantages in accuracy, precision, recall, F1 score and other indicators. Especially in complex audio scenarios, this method shows stronger robustness and generalization capabilities, provides a new idea for audio deepfake detection, and has important practical application value. In the future, more advanced audio detection technologies and optimization strategies will be explored to further improve the accuracy and generalization capabilities of audio deepfake detection.

artificial intelligence, detection, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2412.09467

Country:

Asia > China > Hong Kong (0.05)
Asia > India > Uttar Pradesh (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Genre: Research Report > New Finding (0.69)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Parameter-Efficient Fine-Tuning via Selective Discrete Cosine Transform

Shen, Yixian, Bi, Qi, Huang, Jia-Hong, Zhu, Hongyi, Pathania, Anuj

arXiv.org Artificial Intelligence, Oct-9-2024

In the era of large language models, parameter-efficient fine-tuning (PEFT) has been extensively studied. However, these approaches usually rely on the space domain, which encounters storage challenges, especially when handling extensive adaptations or larger models. The frequency domain, in contrast, is more effective in compressing trainable parameters while maintaining the expressive capability. In this paper, we propose a novel Selective Discrete Cosine Transformation (sDCTFT) fine-tuning scheme to push this frontier. Its general idea is to exploit the superior energy compaction and decorrelation properties of DCT to improve both model efficiency and accuracy. Specifically, it projects the weight change from the low-rank adaptation into the discrete cosine space. Then, the weight change is partitioned over different levels of the discrete cosine spectrum, and the most critical frequency components in each partition are selected. Extensive experiments on four benchmark datasets demonstrate the superior accuracy, reduced computational cost, and lower storage requirements of the proposed method over the prior art. For instance, when performing instruction tuning on the LLaMA3.1-8B model, sDCTFT outperforms LoRA with just 0.05M trainable parameters compared to LoRA's 38.2M, and surpasses FourierFT with 30% fewer trainable parameters. The source code will be publicly available.

large language model, machine learning, sdctft, (19 more...)

arXiv.org Artificial Intelligence

2410.09103

Country:

Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Media > Film (0.68)
Leisure & Entertainment (0.47)
Consumer Products & Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Guarantees of confidentiality via Hammersley-Chapman-Robbins bounds

Chaudhuri, Kamalika, Guo, Chuan, van der Maaten, Laurens, Mahloujifar, Saeed, Tygert, Mark

arXiv.org Machine Learning, Apr-6-2024

Protecting privacy during inference with deep neural networks is possible by adding noise to the activations in the last layers prior to the final classifiers or other task-specific layers. The activations in such layers are known as "features" (or, less commonly, as "embeddings" or "feature embeddings"). The added noise helps prevent reconstruction of the inputs from the noisy features. Lower bounding the variance of every possible unbiased estimator of the inputs quantifies the confidentiality arising from such added noise. Convenient, computationally tractable bounds are available from classic inequalities of Hammersley and of Chapman and Robbins -- the HCR bounds. Numerical experiments indicate that the HCR bounds are on the precipice of being effectual for small neural nets with the data sets, "MNIST" and "CIFAR-10," which contain 10 classes each for image classification. The HCR bounds appear to be insufficient on their own to guarantee confidentiality of the inputs to inference with standard deep neural nets, "ResNet-18" and "Swin-T," pre-trained on the data set, "ImageNet-1000," which contains 1000 classes. Supplementing the addition of noise to features with other methods for providing confidentiality may be warranted in the case of ImageNet. In all cases, the results reported here limit consideration to amounts of added noise that incur little degradation in the accuracy of classification from the noisy features. Thus, the added noise enhances confidentiality without much reduction in the accuracy on the task of image classification.

iteration, perturbation, vector, (17 more...)

arXiv.org Machine Learning

2404.02866

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)
